Accurate spatial-temporal traffic flow forecasting is essential for helping traffic managers take control measures and drivers choose optimal travel routes. Recently, graph convolutional networks (GCNs) have been widely used in traffic flow prediction owing to their powerful ability to capture spatial-temporal dependencies. The design of the spatial-temporal graph adjacency matrix is key to the success of GCNs, and it remains an open question. This paper proposes a traffic flow forecasting method that reconstructs the binary adjacency matrix via tensor decomposition. First, we reformulate the spatial-temporal fusion graph adjacency matrix into a three-way adjacency tensor. Then, we reconstruct the adjacency tensor via Tucker decomposition, wherein more informative and global spatial-temporal dependencies are encoded. Finally, a Spatial-temporal Synchronous Graph Convolutional module for learning localized spatial-temporal correlations and a Dilated Convolution module for learning global correlations are assembled to aggregate and learn the comprehensive spatial-temporal dependencies of the road network. Experimental results on four open-access datasets demonstrate that the proposed model outperforms state-of-the-art approaches in terms of prediction performance and computational cost.
translated by Google Translate
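For readers unfamiliar with the Tucker decomposition mentioned in the traffic forecasting abstract above, its standard form for a three-way tensor can be written as below. The symbols are assumptions chosen for illustration and do not come from the abstract: $\mathcal{A} \in \mathbb{R}^{I \times J \times K}$ is the adjacency tensor, $\mathcal{G} \in \mathbb{R}^{R_1 \times R_2 \times R_3}$ is the core tensor, and $U^{(1)}, U^{(2)}, U^{(3)}$ are the factor matrices. How the paper chooses the ranks or builds the tensor is not stated in the abstract.

```latex
\mathcal{A} \approx \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)},
\qquad
a_{ijk} \approx \sum_{p=1}^{R_1} \sum_{q=1}^{R_2} \sum_{r=1}^{R_3}
  g_{pqr}\, u^{(1)}_{ip}\, u^{(2)}_{jq}\, u^{(3)}_{kr}
```

Because the reconstructed entries are sums of products of real-valued factors, they are no longer restricted to 0 and 1, which is one way a reconstructed tensor can carry more information than a binary adjacency matrix.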
Recently, a surge of high-quality 3D-aware GANs has been proposed, which leverage the generative power of neural rendering. It is natural to combine 3D GANs with GAN inversion methods that project a real image into the generator's latent space, allowing free-view consistent synthesis and editing; this is referred to as 3D GAN inversion. Even with the facial prior preserved in pre-trained 3D GANs, reconstructing a 3D portrait from a single monocular image is still an ill-posed problem. A straightforward application of 2D GAN inversion methods focuses on texture similarity only while ignoring the correctness of the 3D geometry. This can cause geometry collapse, especially when reconstructing a side face under an extreme pose. In addition, the synthesized results in novel views tend to be blurry. In this work, we propose a novel method to improve 3D GAN inversion by introducing a facial symmetry prior. We design a pipeline and constraints to make full use of the pseudo auxiliary view obtained via image flipping, which helps obtain a robust and reasonable geometry during the inversion process. To enhance texture fidelity in unobserved viewpoints, pseudo labels from depth-guided 3D warping provide extra supervision. We also design constraints that filter out conflicting areas from optimization in asymmetric situations. Comprehensive quantitative and qualitative evaluations on image reconstruction and editing demonstrate the superiority of our method.
To enable pre-trained models to be fine-tuned with local data on edge devices without sharing data with the cloud, we design an efficient split fine-tuning (SFT) framework for edge and cloud collaborative learning. We propose three novel techniques in this framework. First, we propose a matrix decomposition-based method to compress the intermediate output of a neural network, reducing the communication volume between the edge device and the cloud server. Second, we eliminate particular links in the model without affecting the convergence performance in fine-tuning. Third, we implement our system atop PyTorch so that users can easily extend their existing training scripts to benefit from efficient edge and cloud collaborative learning. Experimental results on 9 NLP datasets show that our framework can reduce the communication traffic by a factor of 96 with little impact on model accuracy.
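To give a feel for why decomposing an intermediate output can shrink the edge-to-cloud traffic, here is a minimal arithmetic sketch. It assumes a plain rank-r factorization of a rows x cols activation matrix into two factors; the abstract only says "matrix decomposition-based", so the exact scheme, the function names, and the example sizes below are all hypothetical.

```python
def factor_payload(rows: int, cols: int, rank: int) -> int:
    """Values sent when a rows x cols matrix is replaced by two rank-`rank` factors.

    Assumption: factors of shape (rows x rank) and (rank x cols) are transmitted.
    """
    if rank <= 0 or rows <= 0 or cols <= 0:
        raise ValueError("rows, cols and rank must be positive")
    return rank * (rows + cols)


def compression_ratio(rows: int, cols: int, rank: int) -> float:
    """Ratio of the dense payload to the factorized payload."""
    return (rows * cols) / factor_payload(rows, cols, rank)


if __name__ == "__main__":
    # Hypothetical activation of 512 tokens x 768 hidden units, rank 4.
    print(compression_ratio(512, 768, 4))
```

The ratio grows as the rank shrinks, so the practical question is how small the rank can be before fine-tuning accuracy suffers; the abstract reports the outcome of that trade-off but not the rank used.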
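Weakly supervised object localization (WSOL) aims to learn representations that encode object location using only image-level category labels. However, many objects can be labeled at different levels of granularity. Is it an animal, a bird, or a great horned owl? Which image-level labels should we use? In this paper, we study the role of label granularity in WSOL. To facilitate this investigation, we introduce iNatLoc500, a new large-scale fine-grained benchmark dataset for WSOL. Surprisingly, we find that choosing the right training label granularity provides a larger performance boost than choosing the best WSOL algorithm. We also show that changing the label granularity can significantly improve data efficiency.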
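Semi-supervised object detection has made significant progress with the development of mean-teacher-driven self-training. Despite the promising results, the label mismatch problem has not been fully explored in previous work, leading to severe confirmation bias during self-training. In this paper, we propose a simple yet effective label-matching framework from two different yet complementary perspectives: the distribution level and the instance level. For the former, it is reasonable to approximate the class distribution of the unlabeled data from that of the labeled data according to Monte Carlo sampling. Guided by this weak supervision cue, we introduce a re-distribution mean teacher, which uses adaptive, label-distribution-aware confidence thresholds to generate unbiased pseudo labels that drive student learning. For the latter, there is an overlooked label assignment ambiguity problem across the teacher and student models. To address it, we propose a new label assignment mechanism for the self-training framework, namely proposal self-assignment, which injects the student's proposals into the teacher and generates accurate pseudo labels to match each proposal in the student model accordingly. Experiments on the MS-COCO and PASCAL-VOC datasets demonstrate the considerable superiority of our proposed framework over other state-of-the-art methods. Code will be available at https://github.com/hikvision-research/ssod.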
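The idea of label-distribution-aware confidence thresholds can be illustrated with a small sketch. It assumes that, for each class, the threshold is set so that the number of accepted pseudo labels matches the count expected from the labeled-data class distribution. This is one plausible reading of the abstract, not the authors' exact procedure, and the function name and inputs are hypothetical.

```python
def class_thresholds(scores_by_class, expected_counts):
    """Pick one confidence threshold per class.

    scores_by_class: dict mapping class name -> list of teacher confidence scores
                     on unlabeled data (hypothetical input format).
    expected_counts: dict mapping class name -> number of pseudo labels expected
                     for that class, e.g. estimated from the labeled set.
    """
    thresholds = {}
    for cls, scores in scores_by_class.items():
        ranked = sorted(scores, reverse=True)
        k = min(expected_counts.get(cls, 0), len(ranked))
        # Keep the top-k detections; with k == 0 nothing passes the threshold.
        thresholds[cls] = ranked[k - 1] if k > 0 else float("inf")
    return thresholds


if __name__ == "__main__":
    scores = {"cat": [0.9, 0.8, 0.3], "dog": [0.6, 0.5]}
    print(class_thresholds(scores, {"cat": 2, "dog": 1}))
```

Under this reading, a frequent class ends up with a lower threshold than a rare one whenever the teacher is equally confident about both, which is how a per-class rule can counteract the bias of a single global threshold.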
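Recently, streaming technology has greatly promoted the development of the livestream field. Because livestream recordings are extremely long, extracting highlight segments is essential for effective reproduction and redistribution. Although many approaches have proven effective at highlight detection for other modalities, the challenges of livestream processing, such as extreme durations, large topic shifts, and abundant irrelevant information, severely hamper the adaptation and compatibility of these methods. In this paper, we formulate a new task, livestream highlight detection, discuss and analyze the difficulties listed above, and propose a novel architecture, AntPivot, to solve this problem. Specifically, we first encode the raw data into multiple views and model their temporal relations to capture clues with a hierarchical attention mechanism. We then convert the detection of highlight clips into a search for the optimal decision sequence and use the fully integrated representations to predict the final results with a dynamic programming mechanism. Furthermore, we construct a fully annotated dataset, AntHighlight, to instantiate this task and evaluate the performance of our model. Extensive experiments demonstrate the effectiveness and validity of our proposed method.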
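Brain networks characterize the complex connectivity among brain regions as graph structures, which provides a powerful means of studying brain connectomes. In recent years, graph neural networks have emerged as a prevalent paradigm for learning with structured data. However, most brain network datasets are limited in sample size because of the relatively high cost of data acquisition, which prevents deep learning models from being trained sufficiently. Inspired by meta-learning, which learns new concepts quickly from limited training examples, this paper studies data-efficient training strategies for analyzing brain connectomes in a cross-dataset setting. Specifically, we propose to meta-train the model on datasets with large sample sizes and transfer the knowledge to small datasets. In addition, we explore two brain-network-oriented designs: atlas transformation and adaptive task reweighting. Compared with other pre-training strategies, our meta-learning-based approach achieves higher and more stable performance, which demonstrates the effectiveness of our proposed solutions. The framework is also able to derive new insights into the similarities among datasets and diseases in a data-driven fashion.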
Graph Neural Networks (GNNs) are powerful tools for graph representation learning. Despite their rapid development, GNNs also face challenges such as over-fitting, over-smoothing, and non-robustness. Previous works indicate that these problems can be alleviated by random dropping methods, which integrate augmented data into models by randomly masking parts of the input. However, some open problems of random dropping on GNNs remain to be solved. First, it is challenging to find a universal method that is suitable for all cases, given the divergence of different datasets and models. Second, the augmented data introduced to GNNs causes incomplete coverage of parameters and an unstable training process. Third, there is no theoretical analysis of the effectiveness of random dropping methods on GNNs. In this paper, we propose a novel random dropping method called DropMessage, which performs dropping operations directly on the propagated messages during the message-passing process. More importantly, we find that DropMessage provides a unified framework for most existing random dropping methods, based on which we give a theoretical analysis of their effectiveness. Furthermore, we elaborate on the superiority of DropMessage: it stabilizes the training process by reducing sample variance, and it preserves information diversity from the perspective of information theory, making it a theoretical upper bound of other methods. To evaluate our proposed method, we conduct experiments on multiple tasks using five public datasets and two industrial datasets with various backbone models. The experimental results show that DropMessage has the advantages of both effectiveness and generalization, and can significantly alleviate the problems mentioned above.
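Dropping on propagated messages, rather than on node features or edges, can be sketched in a few lines. The sketch below assumes one message vector per directed edge, independent element-wise dropping with rescaling by 1 / (1 - drop_rate) as in ordinary dropout, and sum aggregation; these details and all names are assumptions for illustration and are not taken from the abstract.

```python
import random


def drop_message(messages, drop_rate, rng):
    """Zero each message element independently with probability drop_rate.

    Surviving elements are rescaled so the expected value is unchanged
    (an assumption borrowed from standard dropout).
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    scale = 1.0 / (1.0 - drop_rate)
    return [
        [0.0 if rng.random() < drop_rate else value * scale for value in row]
        for row in messages
    ]


def aggregate_sum(num_nodes, edges, messages):
    """Sum the message on each (source, target) edge into its target node."""
    dim = len(messages[0]) if messages else 0
    out = [[0.0] * dim for _ in range(num_nodes)]
    for (_, target), message in zip(edges, messages):
        for k, value in enumerate(message):
            out[target][k] += value
    return out


if __name__ == "__main__":
    edges = [(0, 1), (1, 0), (0, 1)]
    messages = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    dropped = drop_message(messages, 0.5, random.Random(0))
    print(aggregate_sum(2, edges, dropped))
```

Because the mask is applied per edge and per feature, the same source node can contribute different surviving features to different neighbors, which is a finer granularity than masking a whole node or a whole edge.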
The click-through rate (CTR) prediction task is to predict whether a user will click on a recommended item. As massive amounts of data are produced online daily, accelerating CTR prediction model training is critical to keeping the model up to date and reducing the training cost. One approach to increasing the training speed is large batch training. However, as shown in computer vision and natural language processing tasks, training with a large batch easily suffers from a loss of accuracy. Our experiments show that previous scaling rules fail in the training of CTR prediction neural networks. To tackle this problem, we first show theoretically that the different frequencies of ids make it challenging to scale hyperparameters when scaling the batch size. To stabilize the training process in a large batch size setting, we develop adaptive Column-wise Clipping (CowClip). It enables an easy and effective scaling rule for the embeddings, which keeps the learning rate unchanged and scales the L2 loss. We conduct extensive experiments with four CTR prediction networks on two real-world datasets and successfully scale the batch size to 128 times the original without accuracy loss. In particular, when training the CTR prediction model DeepFM on the Criteo dataset, our optimization framework enlarges the batch size from 1K to 128K with over 0.1% AUC improvement and reduces training time from 12 hours to 10 minutes on a single V100 GPU. Our code is available at https://github.com/bytedance/LargeBatchCTR.
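As a rough illustration of what clipping "column-wise" might look like, the sketch below clips the gradient of each id's embedding vector separately, with a limit proportional to the norm of that embedding. This is a simplified assumption, not the CowClip rule itself: the published method may, for example, also account for how often each id occurs in the batch. The names `ratio` and `min_norm` are hypothetical.

```python
import math


def l2_norm(vector):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vector))


def clip_per_id(grads, weights, ratio=1.0, min_norm=1e-5):
    """Clip each id's embedding gradient to at most max(ratio * ||w_id||, min_norm).

    grads, weights: dicts mapping id -> list of floats of equal length.
    """
    clipped = {}
    for key, grad in grads.items():
        limit = max(ratio * l2_norm(weights[key]), min_norm)
        grad_norm = l2_norm(grad)
        # Leave small gradients untouched; shrink large ones onto the limit.
        scale = min(1.0, limit / grad_norm) if grad_norm > 0.0 else 1.0
        clipped[key] = [x * scale for x in grad]
    return clipped


if __name__ == "__main__":
    grads = {"item_42": [3.0, 4.0], "item_7": [0.1, 0.0]}
    weights = {"item_42": [1.0, 0.0], "item_7": [1.0, 0.0]}
    print(clip_per_id(grads, weights))
```

Tying the limit to each embedding's own norm means rarely seen ids and frequently seen ids are bounded on their own scales, instead of sharing one global clipping threshold.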
Mapping the connectome of the human brain using structural or functional connectivity has become one of the most pervasive paradigms for neuroimaging analysis. Recently, Graph Neural Networks (GNNs), motivated by geometric deep learning, have attracted broad interest due to their established power for modeling complex networked data. Despite their superior performance in many fields, there has not yet been a systematic study of how to design effective GNNs for brain network analysis. To bridge this gap, we present BrainGB, a benchmark for brain network analysis with GNNs. BrainGB standardizes the process by (1) summarizing brain network construction pipelines for both functional and structural neuroimaging modalities and (2) modularizing the implementation of GNN designs. We conduct extensive experiments on datasets across cohorts and modalities and recommend a set of general recipes for effective GNN designs on brain networks. To support open and reproducible research on GNN-based brain network analysis, we host the BrainGB website at https://braingb.us with models, tutorials, and examples, as well as an out-of-the-box Python package. We hope that this work will provide useful empirical evidence and offer insights for future research in this novel and promising direction.